5th International Workshop on Health Intelligence, W3PHAI 2021, held in conjunction with the 35th AAAI Conference on Artificial Intelligence, AAAI 2021 ; 1013:101-111, 2022.
Article in English | Scopus | ID: covidwho-1777636

ABSTRACT

Surveillance of open-source media, such as social media, has become an essential complement to traditional surveillance data for quickly detecting changes in the occurrence of diseases in time and space. We present our method for classifying tweets into narratives about COVID-19 symptoms to produce a dataset for downstream surveillance applications. A dataset of 10,405 tweets was manually classified as relevant or not relevant to self-reported symptoms of COVID-19. Five machine learning classification algorithms, each combined with different tokenization methods, were trained and tested on the dataset. A support vector machine (SVM) using term frequency-inverse document frequency (TF-IDF) weighting over character 3- to 4-grams as the tokenization method achieved the highest F1-score, 0.70. However, the classes in the training dataset were imbalanced. To reduce the bias caused by the imbalanced classes, the crowdsourcing website Mechanical Turk was used to add 133 relevant tweets. This addition improved the F1-score from 0.70 to 0.77. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
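
The tokenization and the evaluation metric named in the abstract can be illustrated with a minimal standard-library Python sketch. It covers only character 3- to 4-gram extraction, a TF-IDF weighting, and the F1-score; it does not train an SVM. The function names, the lowercasing step, and the smoothed IDF formula are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=3, n_max=4):
    """Return every character n-gram of length n_min..n_max in text."""
    text = text.lower()  # assumption: case folding, not stated in the abstract
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_vectors(docs):
    """Map each document to a sparse {ngram: weight} dictionary."""
    counts = [Counter(char_ngrams(d)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            # Relative term frequency times a smoothed IDF (one common
            # variant, assumed here rather than taken from the paper).
            gram: (k / total) * (math.log((1 + n_docs) / (1 + df[gram])) + 1)
            for gram, k in c.items()
        })
    return vectors


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the F1-score is computed from true positives, false positives, and false negatives of the relevant class only, it is not inflated by the many correctly rejected irrelevant tweets, which makes it a reasonable metric to report for an imbalanced dataset like the one described.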
